Model evaluation

Sampling Challenges

Elizabeth King
Kevin Middleton

This week

  1. Sampling challenges
  2. Regularizing priors, centering, and normalization
  3. Summarizing results
  4. Convenience packages

Folk theorem of statistical computing

According to Andrew Gelman (2008)

When you have computational problems, often there’s a problem with your model.

When sampling goes poorly, often there is a problem with the model:

  • Coding mistake or typo
  • Asking too much of the data
  • Poor prior specification (too wide or too narrow)

“Good” sampling

  • Traceplot looks like a fuzzy caterpillar
  • Rank histogram is mostly flat
  • Stan samples quickly
  • \(\widehat{R}\) is close to 1
  • Effective sample size is near the number of post-warmup samples

“Bad” sampling

  • Chains converge to different distributions
  • Rank histogram is wavy
  • Stan samples slowly (but also you might just have a lot of data)
  • \(\widehat{R}\) is noticeably greater than 1 (e.g., > 1.05)
  • Effective sample size is < 100
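The \(\widehat{R}\) and effective sample size diagnostics above can be computed directly with the posterior package. A minimal sketch with simulated chains (in practice you would pass the draws from a fitted model):

```r
library(posterior)

# Simulate 4 well-mixed chains of 1,000 iterations for one parameter "mu"
set.seed(1)
sims <- array(rnorm(4 * 1000), dim = c(1000, 4, 1),
              dimnames = list(NULL, NULL, "mu"))
draws <- as_draws_array(sims)

# For well-mixed chains: rhat close to 1, ess_bulk near the total draw count
rhat(extract_variable_matrix(draws, "mu"))
ess_bulk(extract_variable_matrix(draws, "mu"))
```

With independent draws like these, \(\widehat{R}\) is essentially 1 and the bulk ESS is near 4,000; poorly mixing chains pull \(\widehat{R}\) up and the ESS down.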

Well-behaved sampling

See lecture 2-4.

set.seed(467456)
(y <- rnorm(n = 80, mean = 50, sd = 2))
 [1] 44.88692 48.25894 48.44770 50.16844 49.55441 49.64544 48.49286 48.91631
 [9] 50.18508 53.58071 51.02436 50.97705 50.22559 52.74691 51.13053 52.56043
[17] 52.78518 51.04284 51.18040 50.38051 49.88772 49.00229 47.69577 52.15505
[25] 54.26835 49.64690 49.63739 52.25737 47.74531 51.14175 52.37880 50.37451
[33] 50.81945 52.62929 49.75200 50.66841 52.09346 45.51304 50.30989 48.95920
[41] 49.78614 48.94043 49.87016 52.75281 48.30467 49.73333 50.92056 52.59823
[49] 52.43258 50.08343 47.82025 46.46541 55.68929 50.39334 49.46525 49.26600
[57] 48.82329 49.49899 46.46622 48.76405 50.32061 49.12468 52.92599 51.42727
[65] 51.01789 53.63535 49.35317 52.96504 50.06350 49.25569 49.72310 49.79884
[73] 49.89066 49.14788 53.12937 49.11339 48.83087 47.81584 47.54861 46.55724
mean(y)
[1] 50.16057
sd(y)
[1] 2.002828

brm() model

priors <- 
  prior("normal(45, 10)", class = "Intercept") +
  prior("normal(0, 3)", class = "sigma", lb = 0)

fm <- brm(y ~ 1,
          data = list(y = y),
          prior = priors,
          chains = 4,
          iter = 1e4,
          refresh = 2000)

Traceplot

mcmc_trace(fm, pars = c("b_Intercept", "sigma"))

Rank histogram plot

mcmc_rank_overlay(fm, pars = c("b_Intercept", "sigma"))

brm summary

summary(fm)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y ~ 1 
   Data: list(y = y) (Number of observations: 80) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Population-Level Effects: 
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    50.16      0.23    49.71    50.60 1.00    17282    13194

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.03      0.16     1.74     2.38 1.00    17442    13195

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Poor sampling

  • Model is incorrect
  • Prior is too narrow, too broad, or misspecified
  • Multilevel (hierarchical / mixed) models
    • Many “random effects” with few points
  • Nonlinear models

Non-linear growth

Growth model

\[ Mass = \frac{a_1}{1 + \exp (-b_1 (age - c_1))} + \frac{a_2}{1 + \exp (-b_2 (age - c_2))}\]

  • two growth spurts
  • 6 parameters
  • 6 data points
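The growth curve above is just the sum of two logistic (sigmoid) functions, one per growth spurt. A minimal R sketch, with illustrative parameter values taken from the prior means used below:

```r
# Double-logistic growth: sum of two sigmoids, one per growth spurt
double_logistic <- function(age, a1, b1, c1, a2, b2, c2) {
  a1 / (1 + exp(-b1 * (age - c1))) +
    a2 / (1 + exp(-b2 * (age - c2)))
}

# Illustrative values (the prior means used in the model below)
double_logistic(age = 10, a1 = 9, b1 = 0.5, c1 = 10,
                a2 = 4, b2 = 0.1, c2 = 17)
```

Note that at `age = c1` the first sigmoid contributes exactly `a1 / 2`, which is why the `c` parameters act as the midpoints of each growth spurt.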

brm model

priors <-
  prior(normal(9, 5), nlpar = "a1") +
  prior(normal(4, 5), nlpar = "a2") +
  prior(normal(0.5, 0.001), nlpar = "b1") +
  prior(normal(0.1, 0.001), nlpar = "b2") +
  prior(normal(10, 5), nlpar = "c1") +
  prior(normal(17, 5), nlpar = "c2")
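Note how narrow the priors on `b1` and `b2` are. A quick check of the implied 95% prior interval for `b1`:

```r
# normal(0.5, 0.001) puts 95% of its mass within about +/- 0.002 of 0.5,
# effectively fixing b1 at 0.5 rather than letting the data inform it
qnorm(c(0.025, 0.975), mean = 0.5, sd = 0.001)
```

This is why the summary below reports `b1` as 0.50 with an estimated error of 0.00: the prior, not the data, determines the estimate.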

fm <- brm(
  bf(Mass ~ a1 / (1 + exp(-1 * b1 * (Age - c1))) +
            a2 / (1 + exp(-1 * b2 * (Age - c2))),
     a1 + b1 + c1 + a2 + b2 + c2 ~ 1,
     nl = TRUE),
  data = D,
  prior = priors,
  backend = "cmdstan",
  chains = 4,
  cores = 4,
  iter = 1e4,
  refresh = 2000,
  seed = 347922,
  save_pars = save_pars(all = TRUE)
)

Warning

Warning: 2070 of 20000 (10.0%) transitions ended with a divergence.
See https://mc-stan.org/misc/warnings for details.

Traceplot

mcmc_trace(fm, regex_pars = "b")

Rank histogram plot

mcmc_rank_overlay(fm, regex_pars = "b")

brm summary

summary(fm)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Mass ~ a1/(1 + exp(-1 * b1 * (Age - c1))) + a2/(1 + exp(-1 * b2 * (Age - c2))) 
         a1 ~ 1
         b1 ~ 1
         c1 ~ 1
         a2 ~ 1
         b2 ~ 1
         c2 ~ 1
   Data: D (Number of observations: 6) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept     6.70      2.02     2.91    11.00 1.02      864     1227
b1_Intercept     0.50      0.00     0.50     0.50 1.00     1322     1644
c1_Intercept    14.62      1.55    11.13    16.86 1.01     1528     2413
a2_Intercept     7.52      3.41     0.18    13.72 1.01     1054     1192
b2_Intercept     0.10      0.00     0.10     0.10 1.00     2413     3242
c2_Intercept    19.15      4.70     9.63    28.04 1.00     1112     2808

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.55      0.38     0.19     1.58 1.01      259      118

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

brm model

priors <-
  prior(normal(9, 5), nlpar = "a1") +
  prior(normal(4, 5), nlpar = "a2") +
  prior(normal(0, 2), nlpar = "b1") +
  prior(normal(0, 2), nlpar = "b2") +
  prior(normal(10, 5), nlpar = "c1") +
  prior(normal(17, 5), nlpar = "c2")

fm <- brm(
  bf(Mass ~ a1 / (1 + exp(-1 * b1 * (Age - c1))) +
            a2 / (1 + exp(-1 * b2 * (Age - c2))),
     a1 + b1 + c1 + a2 + b2 + c2 ~ 1,
     nl = TRUE),
  data = D,
  prior = priors,
  backend = "cmdstan",
  chains = 4,
  cores = 4,
  iter = 1e4,
  refresh = 2000,
  seed = 984575,
  save_pars = save_pars(all = TRUE)
)

Traceplot

Rank histogram plot

brm summary

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: Mass ~ a1/(1 + exp(-1 * b1 * (Age - c1))) + a2/(1 + exp(-1 * b2 * (Age - c2))) 
         a1 ~ 1
         b1 ~ 1
         c1 ~ 1
         a2 ~ 1
         b2 ~ 1
         c2 ~ 1
   Data: D (Number of observations: 6) 
  Draws: 4 chains, each with iter = 10000; warmup = 5000; thin = 1;
         total post-warmup draws = 20000

Population-Level Effects: 
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
a1_Intercept     9.73      3.75     2.48    16.65 1.18       17       30
b1_Intercept     0.28      1.51    -4.82     2.81 1.14       38       35
c1_Intercept    11.93      3.93     2.68    17.83 1.13       34      102
a2_Intercept     4.56      5.36    -5.17    13.48 1.24       12      125
b2_Intercept     0.44      1.27    -2.69     3.17 1.09       69      233
c2_Intercept    17.41      4.33    10.22    26.17 1.09       33      525

Family Specific Parameters: 
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.60      0.46     0.09     1.85 1.07       53       31

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Warnings

Warning messages:
1: Parts of the model have not converged (some Rhats are > 1.05). Be careful when analysing the results! We recommend running more iterations and/or setting stronger priors. 
2: There were 3227 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup 

Folk theorem of Bayesian modeling with Stan

When Stan models fail, they often fail badly and with many warning messages.

Possible solutions

  • Check for errors
  • Determine better priors (use your knowledge)
  • Redefine model (centering, standardizing)
  • Ask Stan to work harder during warmup and sampling, via the control argument to brm():
control = list(adapt_delta = 0.99,
               max_treedepth = 15)
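Centering and standardizing (mentioned above) are one-liners in R. A sketch using a hypothetical data frame `D` with an `Age` column (the ages here are invented for illustration):

```r
# Hypothetical data frame with an Age predictor
D <- data.frame(Age = c(2, 6, 10, 14, 18, 22))

# Centering subtracts the mean; standardizing also divides by the SD
D$Age_c <- D$Age - mean(D$Age)
D$Age_z <- (D$Age - mean(D$Age)) / sd(D$Age)

mean(D$Age_c)  # centered: mean is 0
sd(D$Age_z)    # standardized: SD is 1
```

Rescaled predictors often make the posterior geometry easier for Stan to sample, and make default-scale priors easier to choose.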